Measures of semantic similarity and relatedness in the biomedical domain

نویسندگان

  • Ted Pedersen
  • Serguei V. S. Pakhomov
  • Siddharth Patwardhan
  • Christopher G. Chute
چکیده

Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concepts. The measures include two path-based measures, and three measures that augment path-based measures with information content statistics from corpora. We also derive a context vector measure based on medical corpora that can be used as a measure of semantic relatedness. These six measures are evaluated against a newly created test bed of 30 medical concept pairs scored by three physicians and nine medical coders. We find that the medical coders and physicians differ in their ratings, and that the context vector measure correlates most closely with the physicians, while the path-based measures and one of the information content measures correlates most closely with the medical coders. We conclude that there is a role both for more flexible measures of relatedness based on information derived from corpora, as well as for measures that rely on existing ontological structures.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text

INTRODUCTION In this article, we evaluate a knowledge-based word sense disambiguation method that determines the intended concept associated with an ambiguous word in biomedical text using semantic similarity and relatedness measures. These measures quantify the degree of similarity or relatedness between concepts in the Unified Medical Language System (UMLS). The objective of this work is to d...

متن کامل

Evaluating semantic similarity and relatedness over the semantic grouping of clinical term pairs

INTRODUCTION This article explores how measures of semantic similarity and relatedness are impacted by the semantic groups to which the concepts they are measuring belong. Our goal is to determine if there are distinctions between homogeneous comparisons (where both concepts belong to the same group) and heterogeneous ones (where the concepts are in different groups). Our hypothesis is that the...

متن کامل

tESA: a distributional measure for calculating semantic relatedness

BACKGROUND Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on words, which represent these concepts. Approximating semantic relatedness between texts and concepts represented by these texts is an important part of many text and knowledge processing tasks of crucial importa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of biomedical informatics

دوره 40 3  شماره 

صفحات  -

تاریخ انتشار 2007